
thomasw21 (Member) commented Sep 16, 2021

Tests were failing because the local script I use to run the tests sets some environment variables to mimic a distributed setting.

I suggest enabling "Hide whitespace changes" when reviewing the code.
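A minimal sketch of the idea, assuming the goal is to make the test independent of whatever launcher variables a developer's shell already exports: set the single-GPU values explicitly for the duration of the test and restore the previous environment afterwards. The variable names are the standard torch.distributed env:// ones. The helper name `single_gpu_dist_env` and the port number are hypothetical, and the sketch uses the standard library's `unittest.mock.patch.dict` rather than the test suite's own `mockenv_context`.

```python
import os
from contextlib import contextmanager
from unittest import mock

# Hypothetical values for a single-process "distributed" run; the port is arbitrary.
DIST_ENV_1_GPU = {
    "MASTER_ADDR": "localhost",
    "MASTER_PORT": "10999",
    "RANK": "0",
    "LOCAL_RANK": "0",
    "WORLD_SIZE": "1",
}


@contextmanager
def single_gpu_dist_env(**overrides):
    """Temporarily set the env:// rendezvous variables, then restore os.environ.

    Values passed in `overrides` take precedence over DIST_ENV_1_GPU.
    Anything the developer's shell exported is shadowed inside the block
    and put back on exit, so the test sees the same values locally and on CI.
    """
    values = {**DIST_ENV_1_GPU, **overrides}
    # patch.dict snapshots os.environ on entry and restores it on exit.
    with mock.patch.dict(os.environ, values):
        yield
```

In a test this would wrap the same calls as before, e.g. `with single_gpu_dist_env(): deepspeed.init_distributed()`. Assuming DeepSpeed only falls back to MPI discovery when some of these variables are missing, having all five present should keep it from searching for MPI.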

thomasw21 requested a review from stas00 on September 16, 2021 at 19:53
thomasw21 merged commit 48dcee4 into bigscience-workshop:main on Sep 16, 2021
stas00 (Contributor) commented Sep 16, 2021

Perfect! Thank you, Thomas.

ofirpress pushed a commit to ofirpress/Megatron-DeepSpeed that referenced this pull request Sep 23, 2021
* Prevent deepspeed from searching for mpi

* Try something out

* Try setting value directly in test
SaulLu added a commit to SaulLu/Megatron-DeepSpeed that referenced this pull request Sep 24, 2021
stas00 (Contributor) commented Oct 7, 2021

Our CI still runs occasionally, and it just reported a failure:
https://github.com/bigscience-workshop/Megatron-DeepSpeed/runs/3831712274?check_suite_focus=true

FAILED tests/test_model.py::MyTestCase::test_gpt - Failed: Timeout >300.0s

It timed out during initialization, in initialize_megatron():

=================================== FAILURES ===================================
_____________________________ MyTestCase.test_gpt ______________________________

self = <test_model.MyTestCase testMethod=test_gpt>

    def test_gpt(self):
        """Test causal invariance, ie past token don't depend on future tokens."""
        command_args = get_default_args()
    
        with patch('sys.argv', flatten_arguments(command_args)):
            with mockenv_context(**self.dist_env_1_gpu):
                deepspeed.init_distributed()
>               initialize_megatron()

I'm not sure whether it's related to the earlier codecarbon failures reported in the log file:

stderr: AttributeError: 'OfflineEmissionsTracker' object has no attribute '_start_time'
stderr: [codecarbon WARNING @ 20:26:32] graceful shutdown. Exceptions:
stderr: [codecarbon WARNING @ 20:26:32] stopping.
stderr: [codecarbon WARNING @ 20:26:32] <class 'Exception'>
stderr: Traceback (most recent call last):
stderr:   File "/usr/local/lib/python3.8/dist-packages/codecarbon/core/util.py", line 13, in suppress
stderr:     yield
stderr:   File "/usr/lib/python3.8/contextlib.py", line 75, in inner
stderr:     return func(*args, **kwds)
stderr:   File "/usr/local/lib/python3.8/dist-packages/codecarbon/emissions_tracker.py", line 347, in stop
stderr:     if self._start_time is None:
stderr: AttributeError: 'OfflineEmissionsTracker' object has no attribute '_start_time'

Perhaps we should remove codecarbon from our test suite entirely for now. So far it has only caused noise and crashes.

adammoody pushed a commit to adammoody/Megatron-DeepSpeed that referenced this pull request Jun 21, 2023